Extraction of Protein Sequence Motifs Information by Bi-Clustering Algorithm
Abstract
The activities and functions of proteins can potentially be determined from protein sequence motifs. Obtaining protein sequence motifs that are universally conserved and cross protein family boundaries is therefore crucial. In this study, fuzzy C-means and an improved K-means clustering algorithm are applied to granulate the entire dataset and analyze each granule separately. In addition, a modified bi-clustering algorithm is employed to improve cluster quality. This is the first time a bi-clustering algorithm has been applied for cluster extraction purposes. Compared with the traditional shrink method, the modified bi-clustering algorithm generates more clusters with secondary structure similarity greater than 60% at the same data filtering percentage. Moreover, the bi-clustering algorithm is shown to be able to select meaningful amino acids that are of interest to biologists.
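The granulation step mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard fuzzy C-means membership formula with fixed centers and places each data point into every granule whose membership reaches a cutoff. The abstract gives no parameter values, so the fuzzifier `m`, the `threshold`, the function names, and the toy 2-D points are all assumptions made for illustration; the sketch also omits the iterative center updates and the per-granule K-means step, and it is not the authors' implementation.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy C-means membership of one point in each cluster center."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

def granulate(points, centers, threshold=0.3, m=2.0):
    """Place each point in every granule whose membership reaches the threshold."""
    granules = [[] for _ in centers]
    for p in points:
        for idx, u in enumerate(fuzzy_memberships(p, centers, m)):
            if u >= threshold:
                granules[idx].append(p)
    return granules

# Toy 2-D data standing in for encoded sequence segments (hypothetical values).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
granules = granulate(points, centers, threshold=0.3)
```

In this toy run the point midway between the two centers has membership 0.5 in each, so it lands in both granules. That overlap is what distinguishes fuzzy granulation from a hard partition, and each granule could then be clustered independently.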
Similar resources
Extraction of Motif Patterns from Protein Sequences Using K-Means with segment pruning methods
Bioinformatics is the application of information technology to the management of molecular biological data. Motif finding in protein sequences is one of the most crucial tasks in bioinformatics research. Motifs are identified as overly recurring sub-patterns in segments of protein sequence data. Sequence motifs are verified by their structural similarities or their functional roles i...
Innovative Algorithms and Evaluation Methods for Biological Motif Finding
Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are the examples of biological motifs. Due to the wide range of applications, many algorithms and computational tools have been developed for efficient search for biological motifs. Therefore, there are more computationally derived motifs than experimentally validated motifs,...
Protein Sequence Motif Information Generated by Fuzzy - Hybrid Hierarchical K-means Clustering Algorithm
Recurring amino acid sequence patterns are referred to as protein sequence motifs. These recurring patterns are important because the conserved regions have the potential to reveal the role of the protein itself. In this paper, we modify the FGK model and apply the Hybrid Hierarchical K-means (HHK) clustering algorithm, which is a hybrid combination of Agglomerative Hierarchical Clustering an...
New Seed Selection Technique for Protein Sequence Motif Identification
Bioinformatics is a field devoted to the interpretation and analysis of biological data using computational techniques. In recent years the study of bioinformatics has grown tremendously due to the huge amount of biological information generated by the scientific community. Protein sequence motifs are short fragments of conserved amino acids often associated with a specific function. Identifying such...
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
Publication date: 2010